Evaluating Feature Selection Methods for Multi-Label Text Classication
نویسندگان
چکیده
Multi-label text classification deals with problems in which each document is associated with a subset of categories. These documents often consist of a large number of words, which can hinder the performance of learning algorithms. Feature selection is a popular task to find representative words and remove unimportant ones, which could speed up learning and even improve learning performance. This work evaluates eight feature selection algorithms in text benchmark datasets. The best algorithms are subsequently compared with random feature selection and classifiers built using all features. Results agree with literature by finding that well-known approaches, such as maximum chi-squared scoring across all labels, are good choices to reduce text dimensionality while reaching competitive multi-label classification performance.
منابع مشابه
Evaluating Feature Selection Methods for Multi-Label Text Classification
Multi-label text classification deals with problems in which each document is associated with a subset of categories. These documents often consist of a large number of words, which can hinder the performance of learning algorithms. Feature selection is a popular task to find representative words and remove unimportant ones, which could speed up learning and even improve learning performance. T...
متن کاملMLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
متن کاملA Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy
Text classification, the task of metadata to documents, requires significant time and effort when performed by humans. Moreover, with online-generated content explosively growing, it becomes a challenge for manually annotating with large scale and unstructured data. Currently, lots of state-or-art text mining methods have been applied to classification process, many of them based on the key wor...
متن کاملAccuracy Based Feature Ranking Metric for Multi-Label Text Classification
In many application domains, such as machine learning, scene and video classification, data mining, medical diagnosis and machine vision, instances belong to more than one categories. Feature selection in single label text classification is used to reduce the dimensionality of datasets by filtering out irrelevant and redundant features. The process of dimensionality reduction in multi-label cla...
متن کاملA theoretical framework for evaluating forward feature selection methods based on mutual information
Feature selection problems arise in a variety of applications, such as microarray analysis, clinical prediction, text categorization, image classification and face recognition, multi-label learning, and classification of internet traffic. Among the various classes of methods, forward feature selection methods based on mutual information have become very popular and are widely used in practice. ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013